Search - text extraction

[Linux-Unix] Xpdf_Linux_Unix

Description: An interface for reading and writing PDF files on Linux, running under the X Window System. The module also includes PDF text extraction and PDF-to-PostScript conversion.
Platform: | Size: 3567616 | Author: Jason | Hits:

[Graph Recognize] zimo221

Description: Glyph (dot-matrix font data) extraction software. It can extract glyph data for text, images, and animations, and is quite convenient to use.
Platform: | Size: 262144 | Author: 刘鹏 | Hits:
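
"Glyph data" here means the bitmap of a character packed into bytes, typically for an LCD or LED display. The listing does not say which byte layout zimo221 produces, so the sketch below assumes one common layout: row-major (horizontal) scanning, leftmost pixel in the most significant bit, each row padded to a whole byte. The function name `pack_glyph` and the `#`/`.` text representation of the bitmap are hypothetical.

```python
def pack_glyph(rows: list[str], on: str = "#") -> list[int]:
    """Pack a monochrome glyph into bytes, assuming row-major, MSB-first layout."""
    packed = []
    for row in rows:
        # Pad each row with "off" pixels to a multiple of 8 so it fills whole bytes.
        width = (len(row) + 7) // 8 * 8
        bits = row.ljust(width, ".")
        for start in range(0, width, 8):
            byte = 0
            for ch in bits[start:start + 8]:
                # Shift left and set the low bit when the pixel is "on".
                byte = (byte << 1) | (1 if ch == on else 0)
            packed.append(byte)
    return packed


# A 5x5 letter "A"; each row becomes one byte.
glyph = ["..#..", ".#.#.", "#...#", "#####", "#...#"]
print([hex(b) for b in pack_glyph(glyph)])
```

For the glyph above this yields `0x20, 0x50, 0x88, 0xf8, 0x88`. Column-major or LSB-first layouts only change the loop order and the bit shift.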

[Mathimatics-Numerical algorithms] SIFTtutorial

Description: Image feature extraction and matching with SIFT code. It can be used for image retrieval; compared with SURF, the algorithm is better at recognizing text in images.
Platform: | Size: 25486336 | Author: sandy | Hits:

[Graph Recognize] DImageProcess

Description: A good VC++ program for text extraction, with test images and images of the test results.
Platform: | Size: 907264 | Author: 李宗州 | Hits:

[Search Engine] joyhtml-0.2.2

Description: HTML main-content extraction; the body text is extracted by pattern matching.
Platform: | Size: 18214912 | Author: yxt | Hits:

[Other] K-PageSearch

Description: Features: multithreaded web spider; targeted page collection; automatic detection of multilingual page encodings; hash-table page de-duplication; intelligent page body extraction; lexicon-based intelligent Chinese word segmentation; segmentation lexicon management; millisecond full-text search over massive data; caching; page snapshots; advanced search; paid (bid) ranking.
Platform: | Size: 3360768 | Author: 洋洋 | Hits:
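
The listing does not describe how K-PageSearch's lexicon-based segmenter works. Forward maximum matching (FMM) is one classic lexicon-based method, sketched below as an assumption: at each position, take the longest lexicon word that matches, falling back to a single character. The function name `fmm_segment` and the toy lexicon are hypothetical.

```python
from typing import Optional


def fmm_segment(text: str, lexicon: set[str], max_len: Optional[int] = None) -> list[str]:
    """Forward maximum matching: greedily take the longest lexicon word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in lexicon:
                tokens.append(word)
                i += size
                break
    return tokens


lexicon = {"中文", "分词", "中文分词", "词库"}
print(fmm_segment("中文分词词库", lexicon))
```

This prints `['中文分词', '词库']`. FMM is fast but greedy, so production segmenters usually add backward matching or statistical disambiguation on top.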

[Special Effects] Text_Feature_Extraction

Description: A study of text feature extraction methods. Representing a text and selecting its feature terms is a basic problem in text mining and information retrieval: the feature words extracted from a text are quantified to represent its information.
Platform: | Size: 43008 | Author: 延春生 | Hits:
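
The description says feature words are "quantified" without naming the weighting. TF-IDF is one widely used choice and is assumed in the sketch below: term frequency within a document times the log inverse document frequency across the collection. The function name `tf_idf` and the pre-tokenized input are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term of each tokenized document by tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["text", "mining", "text"], ["information", "retrieval", "text"]]
for vec in tf_idf(docs):
    print(vec)
```

A term that occurs in every document ("text" here) gets weight 0, so it contributes nothing to telling documents apart.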

[JSP/Java] DocumentExtractor

Description: Integrates resources from open-source projects found online to extract text from Office documents, PDF documents, and HTML files, supplying the text needed to build a search engine.
Platform: | Size: 13574144 | Author: lufaxu | Hits:

[WEB Code] html-extractor

Description: Release of an HTML main-content extraction program, HTMLExtractor. It is based mainly on content statistics and does not yet include any self-learning; it is only an analysis program. Others online have implemented body-text extractors, but most treat them as treasures and will not publish the complete code, and the simple ones some people have released do not analyze or recognize content well. So I wrote a simple one myself. I originally wanted to use the PHP DOM parser, but most web pages are not well-formed and missing tags are common, so I reinvented a simple wheel to parse HTML tags. Its features are basic: every element becomes an object, so memory usage is fairly high, but my goal here was just to get it working, not to optimize it. Since I am not building an application, please do not ask me to adapt it to your business (people used to add me on QQ asking how to modify my examples, which left me speechless); if you like it, you are welcome to help develop and improve it. One more note: because it was written in a hurry, the classes are still fairly tightly coupled; I will improve that later. Project code: http://code.google.com/p/html-extractor/ Online example: http://dev.psm01.cn/c/html-extractor.php
Platform: | Size: 5120 | Author: 小徐 | Hits:
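
HTMLExtractor's exact statistics are not described here, so the sketch below swaps in one simple content-statistics method as an assumption: strip tags with regular expressions (tolerating malformed markup, in the spirit of the author's own tag parser), score each line by its text length minus a penalty, and keep the contiguous run of lines with the highest total score (a maximum-subarray search). The function name `extract_body` and the `penalty=20` value are arbitrary choices, not the program's actual parameters.

```python
import re

SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.S | re.I)
TAG = re.compile(r"<[^>]+>")


def extract_body(html: str, penalty: int = 20) -> str:
    """Return the contiguous run of lines with the most text, by line-length statistics."""
    lines = [TAG.sub("", ln).strip() for ln in SCRIPT_STYLE.sub("", html).splitlines()]
    # Long text lines score positive; empty or short (navigation) lines score negative.
    scores = [len(ln) - penalty for ln in lines]
    best_sum, best_range = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, s in enumerate(scores):
        # Kadane's algorithm: restart the run when its sum is no longer positive.
        if run_sum <= 0:
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best_range = run_sum, (run_start, i + 1)
    start, end = best_range
    return "\n".join(ln for ln in lines[start:end] if ln)


page = (
    "<html><head><style>p{color:red}</style></head>\n"
    "<body>\n"
    '<div><a href="/">Home</a> | <a href="/news">News</a></div>\n'
    "<p>This paragraph is the main article body and it is fairly long.</p>\n"
    "<p>A second long paragraph continues the article with more detail.</p>\n"
    "<div>Copyright</div>\n"
    "</body></html>"
)
print(extract_body(page))
```

On this sample the navigation line and the footer fall outside the best run, so only the two article paragraphs are printed. Pages that put a whole article on one line would need splitting at block tags first.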

[Software Engineering] 1PMULTISCALEPEDGE-BASEDPTEXTPEXTRACTIONPFROMPCOMP

Description: Multiscale edge-based text extraction
Platform: | Size: 248832 | Author: bitc | Hits:

[Windows Develop] PDFNet_32Bit_NET1.1-3.5

Description: The registered version of PdfLib; it lets you write code that works with PDF files, including PDF text extraction.
Platform: | Size: 18819072 | Author: xiaojian | Hits:

[JSP/Java] joyhtml-0.2.2

Description: Web page main-content extraction; a hyperlink-density algorithm computes the weight of each text block.
Platform: | Size: 13660160 | Author: kittyting | Hits:
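
A minimal sketch of hyperlink-density weighting, assuming a block is the text between block-level tags, link density is the share of a block's visible characters that sit inside `<a>` tags, and weight is `chars * (1 - density)`. The tag set, the `max_density=0.5` cutoff, and the names `BlockParser` and `weigh_blocks` are hypothetical; joyhtml's own definitions may differ. It uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that start or end a text block.
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}


def _visible_len(s: str) -> int:
    """Count non-whitespace characters."""
    return sum(1 for c in s if not c.isspace())


class BlockParser(HTMLParser):
    """Split a page into text blocks and count how much of each is link text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, int, int]] = []  # (text, chars, linked chars)
        self._parts: list[str] = []
        self._linked = 0
        self._link_depth = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def flush(self) -> None:
        text = " ".join("".join(self._parts).split())
        chars = _visible_len(text)
        if chars:
            self.blocks.append((text, chars, self._linked))
        self._parts, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._parts.append(data)
        if self._link_depth:
            self._linked += _visible_len(data)


def weigh_blocks(html: str, max_density: float = 0.5):
    """Return (text, link_density, weight) for low-link-density blocks, heaviest first."""
    parser = BlockParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # text after the last block tag
    result = []
    for text, chars, linked in parser.blocks:
        density = linked / chars
        weight = chars - linked  # equals chars * (1 - density)
        if density <= max_density:
            result.append((text, density, weight))
    return sorted(result, key=lambda b: b[2], reverse=True)


html = ('<div><a href="/a">Home</a> <a href="/b">About us</a></div>'
        '<p>Main story text with a <a href="/x">link</a> inside.</p>')
print(weigh_blocks(html))
```

The navigation `<div>` is entirely link text (density 1.0) and is dropped, while the paragraph keeps a weight of 25 non-link characters.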

[Windows Develop] E-mail-address-extraction-tool

Description: Extracts e-mail addresses from text, with the option of filtering them by domain name.
Platform: | Size: 37888 | Author: xunjing | Hits:
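
A minimal sketch of the same idea with a regular expression plus a domain filter. The pattern is a pragmatic approximation, not a full RFC 5322 parser, and the name `extract_emails` and the rule that a subdomain matches its parent domain are assumptions rather than how the uploaded tool behaves.

```python
import re
from typing import Iterable, Optional

# Local part, "@", then a domain of dot-separated labels (at least one dot).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_emails(text: str, allowed_domains: Optional[Iterable[str]] = None) -> list[str]:
    """Return unique addresses in order of appearance, optionally limited to some domains."""
    allowed = None if allowed_domains is None else {d.lower() for d in allowed_domains}
    found, seen = [], set()
    for m in EMAIL_RE.finditer(text):
        addr, domain = m.group(0), m.group(1).lower()
        # Accept the domain itself or any of its subdomains.
        if allowed is not None and not any(
            domain == d or domain.endswith("." + d) for d in allowed
        ):
            continue
        if addr.lower() not in seen:
            seen.add(addr.lower())
            found.append(addr)
    return found


text = "Contact alice@example.com, bob@mail.example.com or spam@bad.net."
print(extract_emails(text))
print(extract_emails(text, {"example.com"}))
```

The trailing period after `bad.net` is not captured because each domain label after a dot must contain at least one character.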

[Graph Recognize] numbers-and-characters-recognation-

Description: Recognition of digits and characters in images. Program files: P0801: segmentation of call-number text images; P0802: segmentation of touching characters; P0803: character recognition; P0804: color license-plate segmentation; P0805: trademark text segmentation; Recognition: recognition subfunction for character recognition; StrDetect01: structural-feature extraction subfunction for character recognition.
Platform: | Size: 206848 | Author: wyuting | Hits:
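
The listing does not say how P0802 splits touching characters. A common approach, assumed in the sketch below, is vertical projection: sum the ink in each column, split at empty columns, and cut any segment wider than an expected character width at its weakest column. The 0/1 list-of-rows image format, the `max_width` parameter, and the function names are hypothetical.

```python
def column_projection(img: list[list[int]]) -> list[int]:
    """Number of ink pixels in each column of a binary image (rows of 0/1)."""
    return [sum(col) for col in zip(*img)]


def _split_wide(proj: list[int], s: int, e: int, max_width: int) -> list[tuple[int, int]]:
    """Recursively cut a too-wide segment [s, e) at its minimum-projection column."""
    if e - s <= max_width:
        return [(s, e)]
    # Candidate cuts exclude column s, so both halves are non-empty and shrink.
    cut = min(range(s + 1, e), key=lambda x: proj[x])
    return _split_wide(proj, s, cut, max_width) + _split_wide(proj, cut, e, max_width)


def segment_columns(img: list[list[int]], max_width: int) -> list[tuple[int, int]]:
    """Return [start, end) column ranges, one per (estimated) character."""
    proj = column_projection(img)
    segments, start = [], None
    for x, v in enumerate(proj + [0]):  # trailing 0 closes a segment at the edge
        if v and start is None:
            start = x
        elif not v and start is not None:
            segments.append((start, x))
            start = None
    result = []
    for s, e in segments:
        result.extend(_split_wide(proj, s, e, max_width))
    return result


# One isolated character, then two characters touching at column 5.
img = [
    [1, 1, 0, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1],
]
print(segment_columns(img, max_width=3))
```

Here the empty column 2 separates the first character, and the wide run from column 3 to 7 is cut at column 5, where the projection is lowest.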

[Graph Recognize] Text-Extraction

Description: An English-language paper on text extraction for character recognition, aimed at images with complex backgrounds.
Platform: | Size: 989184 | Author: Ivan.Ru | Hits:

[File Operate] Select-Chinese-from-the-web

Description: Web page text extraction, already tested; intended mainly for tasks such as filtering spam web pages.
Platform: | Size: 21504 | Author: chemingming | Hits:
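
A minimal sketch of pulling Chinese text out of a page, assuming "Chinese" means runs of characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF; extension blocks and CJK punctuation are ignored). The share of Han characters is shown as one possible input to a spam filter; that use, and the names `extract_chinese` and `han_ratio`, are assumptions. Tag stripping is regex-based, so script and style contents are not removed.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
# Basic CJK Unified Ideographs block only.
HAN_RE = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese(html: str) -> list[str]:
    """Return runs of Han characters from the tag-stripped page."""
    return HAN_RE.findall(TAG_RE.sub(" ", html))


def han_ratio(html: str) -> float:
    """Fraction of visible (non-whitespace) characters that are Han characters."""
    text = TAG_RE.sub(" ", html)
    visible = sum(1 for c in text if not c.isspace())
    han = sum(len(run) for run in HAN_RE.findall(text))
    return han / visible if visible else 0.0


page = "<p>你好, world! 中文网页</p>"
print(extract_chinese(page))
print(han_ratio(page))
```

This prints `['你好', '中文网页']` and a ratio of 6/13, since 6 of the 13 visible characters are Han.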

[JSP/Java] htmlparser

Description: htmlparser, an HTML file analysis tool, with good support for text extraction and further programming.
Platform: | Size: 431104 | Author: li | Hits:

[File Operate] dataFile

Description: A file text-extraction program based on the KMP algorithm: it extracts the desired text from a file, reorganizes it, and writes the result to another file.
Platform: | Size: 1271808 | Author: zrt_168 | Hits:
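
A minimal sketch of KMP-based extraction, assuming the "desired text" is whatever lies between a start marker and the next end marker. The Knuth-Morris-Pratt failure table lets the search scan the input once without backtracking. The marker convention and the names `kmp_find_all` and `extract_between` are hypothetical.

```python
def kmp_table(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of pattern in text."""
    if not pattern:
        return []
    fail = kmp_table(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]  # fall back instead of re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def extract_between(text: str, start: str, end: str) -> list[str]:
    """Pieces of text lying between each start marker and the next end marker."""
    starts, ends = kmp_find_all(text, start), kmp_find_all(text, end)
    pieces, j, last_end = [], 0, 0
    for s in starts:
        if s < last_end:
            continue  # this start marker lies inside a piece already taken
        body = s + len(start)
        while j < len(ends) and ends[j] < body:
            j += 1
        if j == len(ends):
            break
        pieces.append(text[body:ends[j]])
        last_end = ends[j] + len(end)
        j += 1
    return pieces


print(kmp_find_all("abababa", "aba"))
print(extract_between("x [a] y [bc] z", "[", "]"))
```

This prints `[0, 2, 4]` and `['a', 'bc']`. To mirror the program's output step, the pieces could be joined and written to another file, for example with `pathlib.Path.write_text`.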

[Other] getword

Description: Source code of a VB program that extracts text from the Word 97-2003 binary file format.
Platform: | Size: 61440 | Author: xuesfh | Hits:

[Other] getwords

Description: Source code of a VB program that extracts text from the Word 97-2003 binary file format. Note: this control works normally in VC, but in VB it crashes if the .doc file is too large (tested with a 40 MB file); suggestions for a fix from experienced developers are welcome. To open a Word file, paste its absolute path directly, and keep the file reasonably small.
Platform: | Size: 625664 | Author: 汪志红 | Hits:
